Low-delay buffering model in video coding

ABSTRACT

Techniques for low-delay buffering in a video coding process are disclosed. Video decoding techniques may include receiving a first decoded picture buffer (DPB) output delay and a second DPB output delay for a decoded picture, determining, for the decoded picture, a first DPB output time using the first DPB output delay in the case that a hypothetical reference decoder (HRD) setting for a video decoder indicates operation at a picture level, and determining, for the decoded picture, a second DPB output time using the second DPB output delay in the case that the HRD setting for the video decoder indicates operation at a sub-picture level.

RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 61/739,632, filed Dec. 19, 2012, and U.S. Provisional Application No. 61/745,423, filed Dec. 21, 2012, the entire content of each of which is incorporated by reference herein.

TECHNICAL FIELD

This disclosure relates to video coding, and more particularly to techniques for low-delay buffering in a video coding process.

BACKGROUND

Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, so-called “smart phones,” video teleconferencing devices, video streaming devices, and the like. Digital video devices implement video compression techniques, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4, Part 10, Advanced Video Coding (AVC), the High Efficiency Video Coding (HEVC) standard presently under development, and extensions of such standards. The video devices may transmit, receive, encode, decode, and/or store digital video information more efficiently by implementing such video compression techniques.

Video compression techniques perform spatial (intra-picture) prediction and/or temporal (inter-picture) prediction to reduce or remove redundancy inherent in video sequences. For block-based video coding, a video slice (i.e., a video frame or a portion of a video frame) may be partitioned into video blocks, which may also be referred to as treeblocks, coding units (CUs) and/or coding nodes. Video blocks in an intra-coded (I) slice of a picture are encoded using spatial prediction with respect to reference samples in neighboring blocks in the same picture. Video blocks in an inter-coded (P or B) slice of a picture may use spatial prediction with respect to reference samples in neighboring blocks in the same picture or temporal prediction with respect to reference samples in other reference pictures. Pictures may be referred to as frames, and reference pictures may be referred to as reference frames.

Spatial or temporal prediction utilizes a predictive block. Residual data represents pixel differences between the original block to be coded and the predictive block. An inter-coded block is encoded according to a motion vector that points to a block of reference samples forming the predictive block, and the residual data indicating the difference between the coded block and the predictive block. An intra-coded block is encoded according to an intra-coding mode and the residual data. For further compression, the residual data may be transformed from the pixel domain to a transform domain, resulting in residual transform coefficients, which then may be quantized. The quantized transform coefficients, initially arranged in a two-dimensional array, may be scanned in order to produce a one-dimensional vector of transform coefficients, and entropy coding may be applied to achieve even more compression.

SUMMARY

In general, this disclosure describes techniques for video coding, and more particularly techniques for low-delay buffering in a video coding process. In one or more examples, this disclosure proposes techniques for signaling decoded picture buffer (DPB) output delays to be used when a video decoder is operating at a sub-picture level, so as to improve video buffer delay.

In one example of the disclosure, a method of decoding video comprises receiving a first decoded picture buffer (DPB) output delay and a second DPB output delay for a decoded picture, determining, for the decoded picture, a first DPB output time using the first DPB output delay in the case that a hypothetical reference decoder (HRD) setting for a video decoder indicates operation at a picture level, and determining, for the decoded picture, a second DPB output time using the second DPB output delay in the case that the HRD setting for the video decoder indicates operation at a sub-picture level.

In another example of the disclosure, a method of encoding video comprises determining a first DPB output time using a first DPB output delay in the case that an HRD setting for a video decoder indicates operation at a picture level, determining a second DPB output time using a second DPB output delay in the case that the HRD setting for the video decoder indicates operation at a sub-picture level, and signaling the first decoded picture buffer (DPB) output delay and the second DPB output delay.

In another example of the disclosure, an apparatus configured to decode video data comprises a video decoder configured to receive a first DPB output delay and a second DPB output delay for a decoded picture, determine, for the decoded picture, a first DPB output time using the first DPB output delay in the case that an HRD setting for a video decoder indicates operation at a picture level, and determine, for the decoded picture, a second DPB output time using the second DPB output delay in the case that the HRD setting for the video decoder indicates operation at a sub-picture level.

In another example of the disclosure, an apparatus configured to encode video data comprises a video encoder configured to determine a first DPB output time using a first DPB output delay in the case that an HRD setting for a video decoder indicates operation at a picture level, determine a second DPB output time using a second DPB output delay in the case that the HRD setting for the video decoder indicates operation at a sub-picture level, and signal the first DPB output delay and the second DPB output delay.

In another example of the disclosure, an apparatus configured to decode video data comprises means for receiving a first DPB output delay and a second DPB output delay for a decoded picture, means for determining, for the decoded picture, a first DPB output time using the first DPB output delay in the case that an HRD setting for a video decoder indicates operation at a picture level, and means for determining, for the decoded picture, a second DPB output time using the second DPB output delay in the case that the HRD setting for the video decoder indicates operation at a sub-picture level.

In another example of the disclosure, an apparatus configured to encode video data comprises means for determining a first DPB output time using a first DPB output delay in the case that an HRD setting for a video decoder indicates operation at a picture level, means for determining a second DPB output time using a second DPB output delay in the case that the HRD setting for the video decoder indicates operation at a sub-picture level, and means for signaling the first DPB output delay and the second DPB output delay.

In another example, this disclosure describes a computer-readable storage medium storing instructions that, when executed, cause one or more processors of a device configured to decode video data to receive a first DPB output delay and a second DPB output delay for a decoded picture, determine, for the decoded picture, a first DPB output time using the first DPB output delay in the case that an HRD setting for a video decoder indicates operation at a picture level, and determine, for the decoded picture, a second DPB output time using the second DPB output delay in the case that the HRD setting for the video decoder indicates operation at a sub-picture level.

In another example, this disclosure describes a computer-readable storage medium storing instructions that, when executed, cause one or more processors of a device configured to encode video data to determine a first DPB output time using a first DPB output delay in the case that an HRD setting for a video decoder indicates operation at a picture level, determine a second DPB output time using a second DPB output delay in the case that the HRD setting for the video decoder indicates operation at a sub-picture level, and signal the first DPB output delay and the second DPB output delay.

The details of one or more examples are set forth in the accompanying drawings and the description below. Other features, objects, and advantages will be apparent from the description and drawings, and from the claims.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1 is a block diagram illustrating an example video encoding and decoding system that may utilize the techniques described in this disclosure.

FIG. 2 is a block diagram illustrating a buffer model for a hypothetical reference decoder (HRD).

FIG. 3 is a block diagram illustrating an example video encoder that may implement the techniques described in this disclosure.

FIG. 4 is a block diagram illustrating an example video decoder that may implement the techniques described in this disclosure.

FIG. 5 is a flowchart showing an example encoding method according to the techniques of this disclosure.

FIG. 6 is a flowchart showing an example decoding method according to the techniques of this disclosure.

DETAILED DESCRIPTION

This disclosure describes various methods and techniques to achieve reduced codec (coder/decoder) delay in an interoperable manner, through a generic sub-picture based hypothetical reference decoder (HRD) model that includes both sub-picture based coded picture buffer (CPB) operations and sub-picture timing based decoded picture buffer (DPB) operations.

Current approaches to minimizing CPB and/or DPB delay time exhibit the following drawbacks. The output time of a decoded picture is equal to the decoding time (i.e., CPB removal time) of the last decoding unit (i.e., the access unit itself for access unit-level operation) plus the signaled DPB output delay. Thus, two approaches to reduce the delay are generally used. One is to shift the decoding time earlier. The other is to reduce the value of the signaled DPB output delay (relative to the CPB removal time). However, existing solutions for an ultra-low delay buffering model only involve sub-picture based CPB operations, and only take advantage of the first approach to reduce the delay.

In view of these drawbacks, this disclosure proposes techniques for further reducing decoding delay through the signaling and use of reduced values of the signaled DPB output delay relative to the CPB removal time.

FIG. 1 is a block diagram illustrating an example video encoding and decoding system 10 that may utilize the techniques described in this disclosure. As shown in FIG. 1, system 10 includes a source device 12 that generates encoded video data to be decoded at a later time by a destination device 14. Source device 12 and destination device 14 may comprise any of a wide range of devices, including desktop computers, notebook (i.e., laptop) computers, tablet computers, set-top boxes, telephone handsets such as so-called “smart” phones, so-called “smart” pads, televisions, cameras, display devices, digital media players, video gaming consoles, video streaming devices, or the like. In some cases, source device 12 and destination device 14 may be equipped for wireless communication.

Destination device 14 may receive the encoded video data to be decoded via a link 16. Link 16 may comprise any type of medium or device capable of moving the encoded video data from source device 12 to destination device 14. In one example, link 16 may comprise a communication medium to enable source device 12 to transmit encoded video data directly to destination device 14 in real-time. The encoded video data may be modulated according to a communication standard, such as a wireless communication protocol, and transmitted to destination device 14. The communication medium may comprise any wireless or wired communication medium, such as a radio frequency (RF) spectrum or one or more physical transmission lines. The communication medium may form part of a packet-based network, such as a local area network, a wide-area network, or a global network such as the Internet. The communication medium may include routers, switches, base stations, or any other equipment that may be useful to facilitate communication from source device 12 to destination device 14.

Alternatively, encoded data may be output from output interface 22 to a storage device 33. Similarly, encoded data may be accessed from storage device 33 by an input interface. Storage device 33 may include any of a variety of distributed or locally accessed data storage media such as a hard drive, Blu-ray discs, DVDs, CD-ROMs, flash memory, volatile or non-volatile memory, or any other suitable digital storage media for storing encoded video data. In a further example, storage device 33 may correspond to a file server or another intermediate storage device that may hold the encoded video generated by source device 12. Destination device 14 may access stored video data from storage device 33 via streaming or download. The file server may be any type of server capable of storing encoded video data and transmitting that encoded video data to the destination device 14. Example file servers include a web server (e.g., for a website), an FTP server, network attached storage (NAS) devices, or a local disk drive. Destination device 14 may access the encoded video data through any standard data connection, including an Internet connection. This may include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both that is suitable for accessing encoded video data stored on a file server. The transmission of encoded video data from storage device 33 may be a streaming transmission, a download transmission, or a combination of both.

The techniques of this disclosure are not necessarily limited to wireless applications or settings. The techniques may be applied to video coding in support of any of a variety of multimedia applications, such as over-the-air television broadcasts, cable television transmissions, satellite television transmissions, streaming video transmissions, e.g., via the Internet, encoding of digital video for storage on a data storage medium, decoding of digital video stored on a data storage medium, or other applications. In some examples, system 10 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.

In the example of FIG. 1, source device 12 includes a video source 18, video encoder 20 and an output interface 22. In some cases, output interface 22 may include a modulator/demodulator (modem) and/or a transmitter. In source device 12, video source 18 may include a source such as a video capture device, e.g., a video camera, a video archive containing previously captured video, a video feed interface to receive video from a video content provider, and/or a computer graphics system for generating computer graphics data as the source video, or a combination of such sources. As one example, if video source 18 is a video camera, source device 12 and destination device 14 may form so-called camera phones or video phones. However, the techniques described in this disclosure may be applicable to video coding in general, and may be applied to wireless and/or wired applications.

The captured, pre-captured, or computer-generated video may be encoded by video encoder 20. The encoded video data may be transmitted directly to destination device 14 via output interface 22 of source device 12. The encoded video data may also (or alternatively) be stored onto storage device 33 for later access by destination device 14 or other devices, for decoding and/or playback.

Destination device 14 includes an input interface 28, a video decoder 30, and a display device 32. In some cases, input interface 28 may include a receiver and/or a modem. Input interface 28 of destination device 14 receives the encoded video data over link 16. The encoded video data communicated over link 16, or provided on storage device 33, may include a variety of syntax elements generated by video encoder 20 for use by a video decoder, such as video decoder 30, in decoding the video data. Such syntax elements may be included with the encoded video data transmitted on a communication medium, stored on a storage medium, or stored at a file server.

Display device 32 may be integrated with, or external to, destination device 14. In some examples, destination device 14 may include an integrated display device and also be configured to interface with an external display device. In other examples, destination device 14 may be a display device. In general, display device 32 displays the decoded video data to a user, and may comprise any of a variety of display devices such as a liquid crystal display (LCD), a plasma display, an organic light emitting diode (OLED) display, or another type of display device.

Video encoder 20 and video decoder 30 may operate according to a video compression standard, such as the High Efficiency Video Coding (HEVC) standard presently under development, and may conform to the HEVC Test Model (HM). HEVC is being developed by the Joint Collaboration Team on Video Coding (JCT-VC) of the ITU-T Video Coding Experts Group (VCEG) and ISO/IEC Motion Picture Experts Group (MPEG). One Working Draft (WD) of HEVC, Bross, et al., “High Efficiency Video Coding (HEVC) text specification draft 9,” referred to as HEVC WD9 hereinafter, is available, as of Jul. 5, 2013, from http://phenix.int-evry.fr/jct/doc_end_user/documents/11_Shanghai/wg11/JCTVC-K1003-v13.zip. The entire content of HEVC WD9 is incorporated by reference herein.

Alternatively, video encoder 20 and video decoder 30 may operate according to other proprietary or industry standards, such as the ITU-T H.264 standard, alternatively referred to as MPEG-4, Part 10, Advanced Video Coding (AVC), or extensions of such standards. The techniques of this disclosure, however, are not limited to any particular coding standard. Other examples of video compression standards include MPEG-2 and ITU-T H.263.

Although not shown in FIG. 1, in some aspects, video encoder 20 and video decoder 30 may each be integrated with an audio encoder and decoder, and may include appropriate MUX-DEMUX units, or other hardware and software, to handle encoding of both audio and video in a common data stream or separate data streams. If applicable, in some examples, MUX-DEMUX units may conform to the ITU H.223 multiplexer protocol, or other protocols such as the user datagram protocol (UDP).

Video encoder 20 and video decoder 30 each may be implemented as any of a variety of suitable encoder circuitry, such as one or more microprocessors, digital signal processors (DSPs), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), discrete logic, software, hardware, firmware or any combinations thereof. When the techniques are implemented partially in software, a device may store instructions for the software in a suitable, non-transitory computer-readable medium and execute the instructions in hardware using one or more processors to perform the techniques of this disclosure. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in a respective device.

The JCT-VC is working on development of the HEVC standard. The HEVC standardization efforts are based on an evolving model of a video coding device referred to as the HEVC Test Model (HM). The HM presumes several additional capabilities of video coding devices relative to existing devices according to, e.g., ITU-T H.264/AVC. For example, whereas H.264 provides nine intra-prediction encoding modes, the HM may provide as many as thirty-three intra-prediction encoding modes.

In general, the working model of the HM describes that a video frame or picture may be divided into a sequence of treeblocks or largest coding units (LCUs) that include both luma and chroma samples. A treeblock has a similar purpose as a macroblock of the H.264 standard. A slice includes a number of consecutive treeblocks in coding order. A video frame or picture may be partitioned into one or more slices. Each treeblock may be split into coding units (CUs) according to a quadtree. For example, a treeblock, as a root node of the quadtree, may be split into four child nodes, and each child node may in turn be a parent node and be split into another four child nodes. A final, unsplit child node, as a leaf node of the quadtree, comprises a coding node, i.e., a coded video block. Syntax data associated with a coded bitstream may define a maximum number of times a treeblock may be split, and may also define a minimum size of the coding nodes.
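As an illustration of the recursive splitting just described, the following C sketch walks a quadtree from a treeblock (root) down to its coding nodes (leaves). The CuNode type, its fields, and the callback are hypothetical stand-ins for illustration; this is not the normative HEVC parsing process.

    /* Hypothetical quadtree traversal; assumes split flags were parsed. */
    typedef struct CuNode {
        int x, y;    /* top-left luma sample position */
        int size;    /* width == height, in luma samples */
        int split;   /* 1: split into four equal child CUs */
    } CuNode;

    void traverse_cu(const CuNode *node, int min_cu_size,
                     void (*process_leaf)(const CuNode *)) {
        if (node->split && node->size > min_cu_size) {
            int half = node->size / 2;
            for (int i = 0; i < 4; i++) {   /* four child nodes */
                CuNode child = { node->x + (i % 2) * half,
                                 node->y + (i / 2) * half,
                                 half, 0 /* child split flag parsed here */ };
                traverse_cu(&child, min_cu_size, process_leaf);
            }
        } else {
            process_leaf(node); /* leaf: a coding node (coded video block) */
        }
    }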

A CU includes a coding node and prediction units (PUs) and transform units (TUs) associated with the coding node. A size of the CU generally corresponds to a size of the coding node and is typically square in shape. The size of the CU may range from 8×8 pixels up to the size of the treeblock, with a maximum of 64×64 pixels or greater. Each CU may contain one or more PUs and one or more TUs. Syntax data associated with a CU may describe, for example, partitioning of the CU into one or more PUs. Partitioning modes may differ between whether the CU is skip or direct mode encoded, intra-prediction mode encoded, or inter-prediction mode encoded. PUs may be partitioned to be non-square in shape. Syntax data associated with a CU may also describe, for example, partitioning of the CU into one or more TUs according to a quadtree. A TU can be square or non-square in shape.

The HEVC standard allows for transformations according to TUs, which may be different for different CUs. The TUs are typically sized based on the size of PUs within a given CU defined for a partitioned LCU, although this may not always be the case. The TUs are typically the same size or smaller than the PUs. In some examples, residual samples corresponding to a CU may be subdivided into smaller units using a quadtree structure known as a “residual quad tree” (RQT). The leaf nodes of the RQT may be referred to as transform units (TUs). Pixel difference values associated with the TUs may be transformed to produce transform coefficients, which may be quantized.

In general, a PU includes data related to the prediction process. For example, when the PU is intra-mode encoded, the PU may include data describing an intra-prediction mode for the PU. As another example, when the PU is inter-mode encoded, the PU may include data defining a motion vector for the PU. The data defining the motion vector for a PU may describe, for example, a horizontal component of the motion vector, a vertical component of the motion vector, a resolution for the motion vector (e.g., one-quarter pixel precision or one-eighth pixel precision), a reference picture to which the motion vector points, and/or a reference picture list (e.g., List 0, List 1, or List C) for the motion vector.
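For illustration only, the motion data enumerated above could be grouped as in the following C sketch; the type and field names are hypothetical and do not correspond to HEVC syntax elements.

    /* Hypothetical container for the inter-prediction data of a PU. */
    typedef enum { LIST_0, LIST_1, LIST_C } RefPicList;

    typedef struct PuMotionData {
        int mv_x;             /* horizontal motion vector component */
        int mv_y;             /* vertical motion vector component */
        int mv_shift;         /* 2: quarter-pel, 3: eighth-pel resolution */
        int ref_pic_idx;      /* reference picture the vector points to */
        RefPicList ref_list;  /* reference picture list for the vector */
    } PuMotionData;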

In general, a TU is used for the transform and quantization processes. A given CU having one or more PUs may also include one or more transform units (TUs). Following prediction, video encoder 20 may calculate residual values from the video block identified by the coding node in accordance with the PU. The coding node is then updated to reference the residual values rather than the original video block. The residual values comprise pixel difference values that may be transformed into transform coefficients, quantized, and scanned using the transforms and other transform information specified in the TUs to produce serialized transform coefficients for entropy coding. The coding node may once again be updated to refer to these serialized transform coefficients. This disclosure typically uses the term “video block” to refer to a coding node of a CU. In some specific cases, this disclosure may also use the term “video block” to refer to a treeblock, i.e., LCU, or a CU, which includes a coding node and PUs and TUs.

A video sequence typically includes a series of video frames or pictures. A group of pictures (GOP) generally comprises a series of one or more of the video pictures. A GOP may include syntax data in a header of the GOP, a header of one or more of the pictures, or elsewhere, that describes a number of pictures included in the GOP. Each slice of a picture may include slice syntax data that describes an encoding mode for the respective slice. Video encoder 20 typically operates on video blocks within individual video slices in order to encode the video data. A video block may correspond to a coding node within a CU. The video blocks may have fixed or varying sizes, and may differ in size according to a specified coding standard.

As an example, the HM supports prediction in various PU sizes. Assuming that the size of a particular CU is 2N×2N, the HM supports intra-prediction in PU sizes of 2N×2N or N×N, and inter-prediction in symmetric PU sizes of 2N×2N, 2N×N, N×2N, or N×N. The HM also supports asymmetric partitioning for inter-prediction in PU sizes of 2N×nU, 2N×nD, nL×2N, and nR×2N. In asymmetric partitioning, one direction of a CU is not partitioned, while the other direction is partitioned into 25% and 75%. The portion of the CU corresponding to the 25% partition is indicated by an “n” followed by an indication of “Up,” “Down,” “Left,” or “Right.” Thus, for example, “2N×nU” refers to a 2N×2N CU that is partitioned horizontally with a 2N×0.5N PU on top and a 2N×1.5N PU on bottom.
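The 25%/75% split can be computed directly from the CU size, as in this small C sketch (a hypothetical helper for the horizontal modes 2N×nU and 2N×nD; the vertical modes nL×2N and nR×2N follow the same pattern for widths):

    /* PU heights for the horizontal asymmetric modes of a 2Nx2N CU.
     * For 2NxnU (is_up = 1) with cu_size = 2N = 64: top = 16 (0.5N),
     * bottom = 48 (1.5N), matching the 2Nx0.5N / 2Nx1.5N example above. */
    void amp_pu_heights(int cu_size, int is_up, int *top_h, int *bottom_h) {
        int quarter = cu_size / 4;            /* 25% of the CU height */
        *top_h = is_up ? quarter : cu_size - quarter;
        *bottom_h = cu_size - *top_h;
    }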

In this disclosure, “N×N” and “N by N” may be used interchangeably to refer to the pixel dimensions of a video block in terms of vertical and horizontal dimensions, e.g., 16×16 pixels or 16 by 16 pixels. In general, a 16×16 block will have 16 pixels in a vertical direction (y=16) and 16 pixels in a horizontal direction (x=16). Likewise, an N×N block generally has N pixels in a vertical direction and N pixels in a horizontal direction, where N represents a nonnegative integer value. The pixels in a block may be arranged in rows and columns. Moreover, blocks need not necessarily have the same number of pixels in the horizontal direction as in the vertical direction. For example, blocks may comprise N×M pixels, where M is not necessarily equal to N.

Following intra-predictive or inter-predictive coding using the PUs of a CU, video encoder 20 may calculate residual data to which the transforms specified by TUs of the CU are applied. The residual data may correspond to pixel differences between pixels of the unencoded picture and prediction values corresponding to the CUs. Video encoder 20 may form the residual data for the CU, and then transform the residual data to produce transform coefficients.

Following any transforms to produce transform coefficients, video encoder 20 may perform quantization of the transform coefficients. Quantization generally refers to a process in which transform coefficients are quantized to possibly reduce the amount of data used to represent the coefficients, providing further compression. The quantization process may reduce the bit depth associated with some or all of the coefficients. For example, an n-bit value may be rounded down to an m-bit value during quantization, where n is greater than m.
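As a rough numerical illustration of that bit-depth reduction (not the actual HEVC quantization formula, which also involves a quantization parameter and scaling), rounding an n-bit value down to m bits can be expressed as a right shift:

    /* Discard the (n - m) least significant bits of an n-bit value.
     * E.g., with n = 9 and m = 8, the value 257 maps to 128. */
    unsigned round_down_to_m_bits(unsigned value, int n, int m) {
        return value >> (n - m);
    }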

In some examples, video encoder 20 may utilize a predefined scan order to scan the quantized transform coefficients to produce a serialized vector that can be entropy encoded. In other examples, video encoder 20 may perform an adaptive scan. After scanning the quantized transform coefficients to form a one-dimensional vector, video encoder 20 may entropy encode the one-dimensional vector, e.g., according to context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), Probability Interval Partitioning Entropy (PIPE) coding or another entropy encoding methodology. Video encoder 20 may also entropy encode syntax elements associated with the encoded video data for use by video decoder 30 in decoding the video data.
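A predefined scan order amounts to a fixed permutation of block positions. The C sketch below serializes a 4×4 block with a caller-supplied scan table; the table contents are left to the caller, since HEVC defines its own scan orders.

    /* Serialize a 4x4 block of quantized coefficients into a
     * one-dimensional vector using a predefined scan table, where
     * scan[i] gives the (row, column) visited at position i. */
    void scan_block_4x4(const int block[4][4], const int scan[16][2],
                        int out[16]) {
        for (int i = 0; i < 16; i++)
            out[i] = block[scan[i][0]][scan[i][1]];
    }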

To perform CABAC, video encoder 20 may assign a context within a context model to a symbol to be transmitted. The context may relate to, for example, whether neighboring values of the symbol are non-zero or not. To perform CAVLC, video encoder 20 may select a variable length code for a symbol to be transmitted. Codewords in VLC may be constructed such that relatively shorter codes correspond to more probable symbols, while longer codes correspond to less probable symbols. In this way, the use of VLC may achieve a bit savings over, for example, using equal-length codewords for each symbol to be transmitted. The probability determination may be based on a context assigned to the symbol.

Video applications that may make use of video encoder 20 and video decoder 30 may include local playback, streaming, broadcast/multicast and conversational applications. Conversational applications include video telephony and video conferencing. Conversational applications are also referred to as low-delay applications, in that such real-time applications are not tolerant of significant delay. For a good user experience, conversational applications require a relatively low end-to-end delay of the entire system, i.e., the delay between the time when a video frame is captured at a source device and the time when the video frame is displayed at a destination device. Typically, an acceptable end-to-end delay for conversational applications should be less than 400 ms. An end-to-end delay of around 150 ms is considered very good.

Each processing step of a conversational application may contribute to the overall end-to-end delay. Example delays from processing steps include capturing delay, pre-processing delay, encoding delay, transmission delay, reception buffering delay (for de-jittering), decoding delay, decoded picture output delay, post-processing delay, and display delay. Typically, the codec delay (encoding delay, decoding delay and decoded picture output delay) is targeted to be minimized in conversational applications. In particular, the coding structure should ensure that the pictures' decoding order and output order are identical such that the decoded picture output delay is equal to or close to zero.

Video coding standards typically include a specification of a video buffering model. In AVC and HEVC, the buffering model is referred to as a hypothetical reference decoder (HRD), which includes a buffering model of both the coded picture buffer (CPB) and the decoded picture buffer (DPB). A CPB is a first-in first-out buffer containing coded pictures for decoding. A DPB is a buffer holding decoded pictures for use in reference (e.g., inter-prediction), output reordering, output delay, and eventual display. The CPB and DPB behaviors are mathematically specified by the HRD. The HRD directly imposes constraints on different timing, buffer sizes and bit rates, and indirectly imposes constraints on bitstream characteristics and statistics. A complete set of HRD parameters includes five basic parameters: initial CPB removal delay, CPB size, bit rate, initial DPB output delay, and DPB size.
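For reference, the five basic parameters could be collected as in the following C sketch; the type name, field names, and units are illustrative, not HEVC syntax element names.

    /* The five basic HRD parameters listed above (hypothetical names). */
    typedef struct HrdParams {
        double initial_cpb_removal_delay; /* in clock ticks */
        long long cpb_size;               /* in bits */
        long long bit_rate;               /* in bits per second */
        double initial_dpb_output_delay;  /* in clock ticks */
        int dpb_size;                     /* in picture storage buffers */
    } HrdParams;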

In AVC and HEVC, bitstream conformance and decoder conformance are specified as parts of the HRD specification. Though the HRD is referred to as a decoder, some techniques specified by the HRD are also typically needed at the encoder side to guarantee bitstream conformance, while typically not needed at the decoder side. Two types of bitstream or HRD conformance, namely Type I and Type II, are specified. Also, two types of decoder conformance (i.e., output timing decoder conformance and output order decoder conformance) are specified.

A Type I bitstream is a network abstraction layer (NAL) unit stream containing only the video coding layer (VCL) NAL units and NAL units with nal_unit_type equal to FD_NUT (filler data NAL units) for all access units in the bitstream. A Type II bitstream contains, in addition to the VCL NAL units and filler data NAL units for all access units in the bitstream, at least one of the following: additional non-VCL NAL units other than filler data NAL units, and all leading_zero_8bits, zero_byte, start_code_prefix_one_3bytes, and trailing_zero_8bits syntax elements that form a byte stream from the NAL unit stream.

FIG. 2 is a block diagram illustrating a buffer model for a hypothetical reference decoder (HRD). The HRD operates as follows. Data associated with decoding units that flow into CPB 102 according to a specified arrival schedule are delivered by the hypothetical stream scheduler (HSS) 100. The streams delivered by HSS 100 may be Type I or Type II bitstreams, as defined above. The data associated with each decoding unit are removed and decoded by decoding process 104 (e.g., by video decoder 30) at the CPB removal time of the decoding unit. Decoding process 104 is performed by video decoder 30. Each decoded picture produced by decoding process 104 is placed in DPB 106. The decoded pictures may be used as reference pictures during decoding process 104 (e.g., during inter-prediction). A decoded picture is removed from DPB 106 when it is no longer needed for inter-prediction reference and no longer needed for output. In some examples, decoded pictures in DPB 106 may be cropped by output cropping unit 108 before being displayed. Output cropping unit 108 may be part of video decoder 30 or may be part of an external processor (e.g., a display processor) configured to further process the output of a video decoder.

In the AVC and HEVC HRD models, decoding or CPB removal is access unit (AU) based, and it is assumed that picture decoding is instantaneous (e.g., decoding process 104 in FIG. 2 is assumed to be instantaneous). An access unit is a set of network abstraction layer (NAL) units and contains one coded picture. In practical applications, if a conforming decoder strictly follows the decoding times signaled, e.g., in picture timing supplemental enhancement information (SEI) messages generated by video encoder 20, to start decoding of AUs, then the earliest possible time to output a particular decoded picture is equal to the decoding time of that particular picture (i.e., the time when a picture starts to be decoded) plus the time needed for decoding that particular picture. The time needed for decoding a picture in the real world cannot be equal to zero.

HEVC WD9 includes the support of sub-picture based CPB operations to enable reduced codec delay, sometimes referred to as ultra-low delay. The CPB may operate at either the AU level (i.e., picture level) or the sub-picture level (i.e., less than an entire picture), depending on whether sub-picture level CPB operation is preferred by a decoder (which may be specified by an external means not specified in the HEVC specification) and whether sub-picture CPB parameters are present (in the bitstream or through external means not specified in the HEVC specification). When both conditions are true, the CPB operates at the sub-picture level (and in this case each decoding unit is defined as a subset of an AU). A decoding unit (DU) is the unit operated on by the decoder. Otherwise, the CPB operates at the AU level (and in this case each decoding unit is defined as an AU). A DU is equal to an AU if the syntax element SubPicCpbFlag is equal to 0; otherwise, the DU is a subset of an AU.

HEVC syntax for sub-picture level CPB parameters includes the following:

-   The following syntax is in the video usability information (VUI) part of the sequence parameter set (SPS):
    -   Whether sub-picture level CPB parameters are present
    -   A tick divisor, for derivation of the sub-picture clock tick
    -   The CPB removal delay length
    -   Whether decoding unit CPB removal delay values are signaled in picture timing SEI messages or decoding unit information SEI messages
    -   The length of CPB size values for CPB operations at the sub-picture level
-   The following syntax is in buffering period SEI messages:
    -   A set of initial CPB removal delay and delay offset values for sub-picture level CPB operations
-   The following syntax is in picture timing SEI messages:
    -   The number of decoding units in an access unit
    -   The number of NAL units in each decoding unit
    -   The decoding unit CPB removal delay values for the decoding units
-   The following syntax is in decoding unit information SEI messages:
    -   The index of each decoding unit into the list of decoding units in an access unit
    -   The decoding unit CPB removal delay value for each decoding unit

Current approaches to minimizing CPB and/or DPB delay time exhibit the following drawbacks. The output time of a decoded picture is equal to the decoding time (i.e., CPB removal time) of the last DU (i.e., the AU itself for AU-level operation) plus the signaled DPB output delay. Thus, two approaches to reduce the delay are generally used. One is to shift the decoding time earlier. The other is to reduce the value of the signaled DPB output delay (relative to the CPB removal time). However, existing solutions for an ultra-low delay buffering model only involve sub-picture based CPB operations, and only take advantage of the first approach to reduce the delay.

In view of these drawbacks, this disclosure proposes techniques for further reducing decoding delay through the signaling and use of reduced values of the signaled DPB output delay relative to the CPB removal time.

Specifically, in one example of the disclosure, one additional value of the DPB output delay relative to the CPB removal time of each AU is signaled by an encoder, e.g., in the picture timing SEI message. This additional signaled DPB output delay is used in the derivation of the DPB output time for sub-picture based HRD operations. In another example, in addition to the additionally signaled DPB output delay, DPB output times are derived using the sub-picture clock tick instead of the clock tick.

Some detailed examples are provided below. If not specifically mentioned, the aspects of the following examples may operate as defined in HEVC WD9.

An example syntax and semantics of the picture timing SEI message, according to one example of this disclosure, are as follows. Syntax elements altered or introduced by this disclosure, such as pic_dpb_output_du_delay, are discussed below.

    pic_timing( payloadSize ) {                                  Descriptor
        if( frame_field_info_present_flag ) {
            pic_struct                                           u(4)
            progressive_source_idc                               u(2)
            duplicate_flag                                       u(1)
        }
        au_cpb_removal_delay_minus1                              u(v)
        pic_dpb_output_delay                                     u(v)
        if( sub_pic_cpb_params_present_flag )
            pic_dpb_output_du_delay                              u(v)
        if( sub_pic_cpb_params_present_flag &&
            sub_pic_cpb_params_in_pic_timing_sei_flag ) {
            num_decoding_units_minus1                            ue(v)
            du_common_cpb_removal_delay_flag                     u(1)
            if( du_common_cpb_removal_delay_flag )
                du_common_cpb_removal_delay_minus1               u(v)
            for( i = 0; i <= num_decoding_units_minus1; i++ ) {
                num_nalus_in_du_minus1[ i ]                      ue(v)
                if( !du_common_cpb_removal_delay_flag &&
                    i < num_decoding_units_minus1 )
                    du_cpb_removal_delay_minus1[ i ]             u(v)
            }
        }
    }

In this example of the disclosure, the altered or introduced syntax elements may operate as follows. The syntax element pic_dpb_output_du_delay is used to compute the DPB output time of the picture when the HRD operates at a sub-picture level (i.e., when SubPicCpbFlag is equal to 1). The syntax element pic_dpb_output_du_delay specifies how many sub-picture clock ticks to wait after removal of the last decoding unit in an access unit from the CPB before the decoded picture is output from the DPB.

In one example, the length of the syntax element pic_dpb_output_du_delay is given in bits by dpb_output_delay_length_minus1 + 1. In another example, the length of the syntax element pic_dpb_output_du_delay is given in bits by the value of another syntax element plus 1, where, e.g., the syntax element is named dpb_output_delay_length_du_minus1 and signaled in the VUI part of the sequence parameter set.
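Under either alternative, a parser simply reads a fixed-length code whose length comes from the signaled length syntax element. The C sketch below assumes a hypothetical BitReader type and read_bits() helper; it is not a normative parsing process.

    /* Hypothetical bit reader interface. */
    typedef struct BitReader BitReader;
    extern unsigned read_bits(BitReader *br, unsigned num_bits);

    /* Read pic_dpb_output_du_delay as a u(v) element whose length in
     * bits is length_minus1 + 1, where length_minus1 is either
     * dpb_output_delay_length_minus1 (first example) or
     * dpb_output_delay_length_du_minus1 (second example). */
    unsigned parse_pic_dpb_output_du_delay(BitReader *br,
                                           unsigned length_minus1) {
        return read_bits(br, length_minus1 + 1);
    }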

The output time derived from pic_dpb_output_du_delay of any picture that is output from an output timing conforming decoder shall precede the output time derived from the pic_dpb_output_du_delay of all pictures in any subsequent coded video sequence in decoding order. In one example, the picture output order established by the values of this syntax element shall be the same order as established by the values of the syntax element PicOrderCntVal, as specified in HEVC WD9. The syntax element PicOrderCntVal indicates the picture order count (POC) of the current picture. A POC value is a variable that is associated with each picture to be output from the DPB that indicates the position of the associated picture in output order relative to the output order positions of the other pictures to be output from the DPB in the same coded video sequence.

For pictures that are not output by the “bumping” process (i.e., the process by which pictures are removed from the DPB) because they precede, in decoding order, an instantaneous decoding refresh (IDR) or broken link access (BLA) picture with no_output_of_prior_pics_flag equal to 1 or inferred to be equal to 1, the output times derived from pic_dpb_output_du_delay shall be increasing with increasing value of PicOrderCntVal relative to all pictures within the same coded video sequence. The syntax element no_output_of_prior_pics_flag specifies how previously-decoded pictures in the DPB are treated after decoding of an IDR or a BLA picture. If no_output_of_prior_pics_flag is equal to or inferred to be 1, then after decoding an IDR or BLA picture, those previously-decoded pictures would not be output, but would be directly flushed/removed from the decoded picture buffer (DPB).

The “bumping” process is invoked in the following cases.

-   The current picture is an IDR or a BLA picture and no_output_of_prior_pics_flag is not equal to 1 and is not inferred to be equal to 1.
-   The current picture is neither an IDR picture nor a BLA picture, and the number of pictures in the DPB that are marked as “needed for output” is greater than the maximum number of pictures allowed to precede any picture in decoding order (sps_max_num_reorder_pics[HighestTid]).
-   The current picture is neither an IDR picture nor a BLA picture, and the number of pictures in the DPB is equal to the maximum required size of the DPB in units of picture storage buffers (sps_max_dec_pic_buffering[HighestTid]).

The “bumping” process includes the following ordered steps:

1.  The picture that is first for output is selected as the one having the smallest value of PicOrderCntVal of all pictures in the DPB marked as “needed for output.”
2.  The picture is cropped, using the conformance cropping window specified in the active sequence parameter set for the picture, the cropped picture is output, and the picture is marked as “not needed for output.”
3.  If the picture storage buffer that included the picture that was cropped and output contains a picture marked as “unused for reference,” the picture storage buffer is emptied. That is, if a picture has been output for display and is no longer needed for inter-prediction, it may be “bumped,” i.e., removed from the DPB.
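The selection in step 1 reduces to a minimum search over the DPB, as in the following C sketch with a hypothetical DPB representation:

    /* Hypothetical DPB entry. */
    typedef struct DpbPicture {
        int poc;                /* PicOrderCntVal */
        int needed_for_output;  /* marked "needed for output" */
    } DpbPicture;

    /* Step 1 of the "bumping" process: pick the picture with the
     * smallest PicOrderCntVal among those marked "needed for output".
     * Returns its index, or -1 if no picture is so marked. */
    int select_bump_candidate(const DpbPicture *dpb, int num_pics) {
        int best = -1;
        for (int i = 0; i < num_pics; i++) {
            if (dpb[i].needed_for_output &&
                (best < 0 || dpb[i].poc < dpb[best].poc))
                best = i;
        }
        return best;
    }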

For any two pictures in the coded video sequence, the difference between the output times of the two pictures derived under sub-picture-level HRD operations shall be identical to the same difference derived under AU-level (i.e., picture level) HRD operations.

An example picture output process is as follows. The following happens instantaneously at the CPB removal time of access unit n, t_(r)(n). When picture n has PicOutputFlag equal to 1, its DPB output time t_(o,dpb)(n) is derived by the following equation, which depends on whether the HRD operates at the AU level or at the sub-picture level (i.e., on the value of SubPicCpbFlag):

    if( !SubPicCpbFlag )  // i.e., the HRD operates at the AU level
        t_(o,dpb)(n) = t_(r)(n) + t_(c) * pic_dpb_output_delay(n)
    else                  // i.e., the HRD operates at the sub-picture level
        t_(o,dpb)(n) = t_(r)(n) + t_(c_sub) * pic_dpb_output_du_delay(n)

where pic_dpb_output_delay(n) and pic_dpb_output_du_delay(n) are the values of pic_dpb_output_delay and pic_dpb_output_du_delay, respectively, specified in the picture timing SEI message associated with access unit n. The variable t_(c) is derived as follows and is called a clock tick:

    t_(c) = num_units_in_tick ÷ time_scale

The variable t_(c_sub) is derived as follows and is called a sub-picture clock tick:

    t_(c_sub) = t_(c) ÷ ( tick_divisor_minus2 + 2 )
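The derivation above maps directly onto a few lines of arithmetic. The following C sketch mirrors the equations; it is illustrative only, with times represented in seconds.

    /* DPB output time t_(o,dpb)(n) of access unit n, per the
     * derivation above.
     *   t_r = CPB removal time t_(r)(n) of access unit n
     *   t_c = num_units_in_tick / time_scale (the clock tick) */
    double dpb_output_time(int SubPicCpbFlag, double t_r, double t_c,
                           unsigned tick_divisor_minus2,
                           unsigned pic_dpb_output_delay,
                           unsigned pic_dpb_output_du_delay) {
        if (!SubPicCpbFlag) /* the HRD operates at the AU level */
            return t_r + t_c * pic_dpb_output_delay;
        /* the HRD operates at the sub-picture level */
        double t_c_sub = t_c / (tick_divisor_minus2 + 2);
        return t_r + t_c_sub * pic_dpb_output_du_delay;
    }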

According to an example of this disclosure, the output of the current picture is specified as follows:

-   If PicOutputFlag is equal to 1 and t_(o,dpb)(n) = t_(r)(n), the current picture is output.
-   Otherwise, if PicOutputFlag is equal to 0, the current picture is not output, but will be stored in the DPB, as specified by the “bumping” process outlined above.
-   Otherwise (PicOutputFlag is equal to 1 and t_(o,dpb)(n) > t_(r)(n)), the current picture is output later and will be stored in the DPB (as specified by the “bumping” process) and is output at time t_(o,dpb)(n) unless indicated not to be output by the decoding or inference of no_output_of_prior_pics_flag equal to 1 at a time that precedes t_(o,dpb)(n). When output, the picture shall be cropped, using the conformance cropping window specified in the active sequence parameter set.
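The three cases reduce to a small decision, sketched below in C with hypothetical action names; the OUTPUT_LATER case remains subject to the no_output_of_prior_pics_flag condition described above.

    typedef enum { OUTPUT_NOW, STORE_ONLY, OUTPUT_LATER } PicAction;

    /* Classify the current picture n from its PicOutputFlag and the
     * times t_o_dpb = t_(o,dpb)(n) and t_r = t_(r)(n) derived above. */
    PicAction classify_picture(int PicOutputFlag,
                               double t_o_dpb, double t_r) {
        if (!PicOutputFlag)
            return STORE_ONLY;   /* stored in the DPB, not output */
        if (t_o_dpb == t_r)
            return OUTPUT_NOW;   /* output at its CPB removal time */
        return OUTPUT_LATER;     /* stored, then output at t_(o,dpb)(n) */
    }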

When picture n is a picture that is output and is not the last picture of the bitstream that is output, the value of Δt_(o,dpb)(n) (i.e., the DPB output time between pictures) is defined as:

    Δt_(o,dpb)(n) = t_(o,dpb)(n_(n)) − t_(o,dpb)(n)

where n_(n) indicates the picture that follows picture n in output order and has PicOutputFlag equal to 1.

FIG. 3 is a block diagram illustrating an example video encoder 20 that may implement the techniques described in this disclosure. Video encoder 20 may perform intra- and inter-coding of video blocks within video slices. Intra-coding relies on spatial prediction to reduce or remove spatial redundancy in video within a given video frame or picture. Inter-coding relies on temporal prediction to reduce or remove temporal redundancy in video within adjacent frames or pictures of a video sequence. Intra-mode (I mode) may refer to any of several spatial based compression modes. Inter-modes, such as uni-directional prediction (P mode) or bi-prediction (B mode), may refer to any of several temporal-based compression modes.

In the example of FIG. 3, video encoder 20 includes prediction processing unit 41, reference picture memory 64, summer 50, transform processing unit 52, quantization unit 54, and entropy encoding unit 56. Prediction processing unit 41 includes motion estimation unit 42, motion compensation unit 44, and intra-prediction processing unit 46. For video block reconstruction, video encoder 20 also includes inverse quantization unit 58, inverse transform processing unit 60, and summer 62. A deblocking filter (not shown in FIG. 3) may also be included to filter block boundaries to remove blockiness artifacts from reconstructed video. If desired, the deblocking filter would typically filter the output of summer 62. Additional loop filters (in loop or post loop) may also be used in addition to the deblocking filter.

As shown in FIG. 3, video encoder 20 receives video data, and prediction processing unit 41 may partition the data into video blocks. This partitioning may also include partitioning into slices, tiles, or other larger units, as well as video block partitioning, e.g., according to a quadtree structure of LCUs and CUs. Video encoder 20 generally illustrates the components that encode video blocks within a video slice to be encoded. The slice may be divided into multiple video blocks (and possibly into sets of video blocks referred to as tiles). Prediction processing unit 41 may select one of a plurality of possible coding modes, such as one of a plurality of intra coding modes or one of a plurality of inter coding modes, for the current video block based on error results (e.g., coding rate and the level of distortion). Prediction processing unit 41 may provide the resulting intra- or inter-coded block to summer 50 to generate residual block data and to summer 62 to reconstruct the encoded block for use as a reference picture.

Intra-prediction processing unit 46 within prediction processing unit 41 may perform intra-predictive coding of the current video block relative to one or more neighboring blocks in the same frame or slice as the current block to be coded to provide spatial compression. Motion estimation unit 42 and motion compensation unit 44 within prediction processing unit 41 perform inter-predictive coding of the current video block relative to one or more predictive blocks in one or more reference pictures to provide temporal compression.

Motion estimation unit 42 may be configured to determine the inter-prediction mode for a video slice according to a predetermined pattern for a video sequence. The predetermined pattern may designate video slices in the sequence as P slices, B slices or GPB slices. Motion estimation unit 42 and motion compensation unit 44 may be highly integrated, but are illustrated separately for conceptual purposes. Motion estimation, performed by motion estimation unit 42, is the process of generating motion vectors, which estimate motion for video blocks. A motion vector, for example, may indicate the displacement of a PU of a video block within a current video frame or picture relative to a predictive block within a reference picture.

A predictive block is a block that is found to closely match the PU of the video block to be coded in terms of pixel difference, which may be determined by sum of absolute difference (SAD), sum of square difference (SSD), or other difference metrics. In some examples, video encoder 20 may calculate values for sub-integer pixel positions of reference pictures stored in reference picture memory 64. For example, video encoder 20 may interpolate values of one-quarter pixel positions, one-eighth pixel positions, or other fractional pixel positions of the reference picture. Therefore, motion estimation unit 42 may perform a motion search relative to the full pixel positions and fractional pixel positions and output a motion vector with fractional pixel precision.

Motion estimation unit 42 calculates a motion vector for a PU of a video block in an inter-coded slice by comparing the position of the PU to the position of a predictive block of a reference picture. The reference picture may be selected from a first reference picture list (List 0) or a second reference picture list (List 1), each of which identifies one or more reference pictures stored in reference picture memory 64. Motion estimation unit 42 sends the calculated motion vector to entropy encoding unit 56 and motion compensation unit 44.

Motion compensation, performed by motion compensation unit 44, may involve fetching or generating the predictive block based on the motion vector determined by motion estimation, possibly performing interpolations to sub-pixel precision. Upon receiving the motion vector for the PU of the current video block, motion compensation unit 44 may locate the predictive block to which the motion vector points in one of the reference picture lists. Video encoder 20 forms a residual video block by subtracting pixel values of the predictive block from the pixel values of the current video block being coded, forming pixel difference values. The pixel difference values form residual data for the block, and may include both luma and chroma difference components. Summer 50 represents the component or components that perform this subtraction operation. Motion compensation unit 44 may also generate syntax elements associated with the video blocks and the video slice for use by video decoder 30 in decoding the video blocks of the video slice.

Intra-prediction processing unit 46 may intra-predict a current block, as an alternative to the inter-prediction performed by motion estimation unit 42 and motion compensation unit 44, as described above. In particular, intra-prediction processing unit 46 may determine an intra-prediction mode to use to encode a current block. In some examples, intra-prediction processing unit 46 may encode a current block using various intra-prediction modes, e.g., during separate encoding passes, and intra-prediction processing unit 46 (or mode select unit 40, in some examples) may select an appropriate intra-prediction mode to use from the tested modes.

For example, intra-prediction processing unit 46 may calculate rate-distortion values using a rate-distortion analysis for the various tested intra-prediction modes, and select the intra-prediction mode having the best rate-distortion characteristics among the tested modes. Rate-distortion analysis generally determines an amount of distortion (or error) between an encoded block and an original, unencoded block that was encoded to produce the encoded block, as well as a bit rate (that is, a number of bits) used to produce the encoded block. Intra-prediction processing unit 46 may calculate ratios from the distortions and rates for the various encoded blocks to determine which intra-prediction mode exhibits the best rate-distortion value for the block.

In any case, after selecting an intra-prediction mode for a block, intra-prediction processing unit 46 may provide information indicative of the selected intra-prediction mode for the block to entropy encoding unit 56. Entropy encoding unit 56 may encode the information indicating the selected intra-prediction mode in accordance with the techniques of this disclosure. Video encoder 20 may include in the transmitted bitstream configuration data, which may include a plurality of intra-prediction mode index tables and a plurality of modified intra-prediction mode index tables (also referred to as codeword mapping tables), definitions of encoding contexts for various blocks, and indications of a most probable intra-prediction mode, an intra-prediction mode index table, and a modified intra-prediction mode index table to use for each of the contexts.

After prediction processing unit 41 generates the predictive block for the current video block via either inter-prediction or intra-prediction, video encoder 20 forms a residual video block by subtracting the predictive block from the current video block. The residual video data in the residual block may be included in one or more TUs and applied to transform processing unit 52. Transform processing unit 52 transforms the residual video data into residual transform coefficients using a transform, such as a discrete cosine transform (DCT) or a conceptually similar transform. Transform processing unit 52 may convert the residual video data from a pixel domain to a transform domain, such as a frequency domain.

Transform processing unit 52 may send the resulting transform coefficients to quantization unit 54. Quantization unit 54 quantizes the transform coefficients to further reduce bit rate. The quantization process may reduce the bit depth associated with some or all of the coefficients. The degree of quantization may be modified by adjusting a quantization parameter. In some examples, quantization unit 54 may then perform a scan of the matrix including the quantized transform coefficients. Alternatively, entropy encoding unit 56 may perform the scan.

Following quantization, entropy encoding unit 56 entropy encodes the quantized transform coefficients. For example, entropy encoding unit 56 may perform context adaptive variable length coding (CAVLC), context adaptive binary arithmetic coding (CABAC), syntax-based context-adaptive binary arithmetic coding (SBAC), probability interval partitioning entropy (PIPE) coding or another entropy encoding methodology or technique. Following the entropy encoding by entropy encoding unit 56, the encoded bitstream may be transmitted to video decoder 30, or archived for later transmission or retrieval by video decoder 30. Entropy encoding unit 56 may also entropy encode the motion vectors and the other syntax elements for the current video slice being coded.

Inverse quantization unit 58 and inverse transform processing unit 60 apply inverse quantization and inverse transformation, respectively, to reconstruct the residual block in the pixel domain for later use as a reference block of a reference picture. Motion compensation unit 44 may calculate a reference block by adding the residual block to a predictive block of one of the reference pictures within one of the reference picture lists. Motion compensation unit 44 may also apply one or more interpolation filters to the reconstructed residual block to calculate sub-integer pixel values for use in motion estimation. Summer 62 adds the reconstructed residual block to the motion compensated prediction block produced by motion compensation unit 44 to produce a reference block for storage in reference picture memory 64 (also called a decoded picture buffer). The reference block may be used by motion estimation unit 42 and motion compensation unit 44 as a reference block to inter-predict a block in a subsequent video frame or picture.

Video encoder 20 may be configured to implement the techniques of this disclosure. In one example, video encoder 20 may be configured to determine a first DPB output time using a first DPB output delay in the case that an HRD setting for a video decoder indicates operation at a picture level, determine a second DPB output time using a second DPB output delay in the case that the HRD setting for the video decoder indicates operation at a sub-picture level, and signal the first DPB output delay and the second DPB output delay in an encoded video bitstream. Further examples of the operation of video encoder 20 in accordance with the techniques of this disclosure will be discussed below with reference to FIG. 5.

FIG. 4 is a block diagram illustrating an example video decoder 30 that may implement the techniques described in this disclosure. In the example of FIG. 4, video decoder 30 includes coded picture buffer (CPB) 78, entropy decoding unit 80, prediction processing unit 81, inverse quantization unit 86, inverse transform processing unit 88, summer 90, and decoded picture buffer (DPB) 92. Prediction processing unit 81 includes motion compensation unit 82 and intra-prediction processing unit 84. Video decoder 30 may, in some examples, perform a decoding pass generally reciprocal to the encoding pass described with respect to video encoder 20 from FIG. 3.

CPB 78 stores coded pictures from the encoded picture bitstream. In one example, CPB 78 is a first-in first-out buffer containing access units (AUs) in decoding order. An AU is a set of network abstraction layer (NAL) units that are associated with each other according to a specified classification rule, are consecutive in decoding order, and contain exactly one coded picture. Decoding order is the order in which pictures are decoded, and may differ from the order in which pictures are displayed (i.e., the display order). The operation of the CPB may be specified by a hypothetical reference decoder (HRD), such as an HRD that operates according to the techniques of this disclosure.
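
A first-in first-out CPB of this kind can be sketched as a small ring buffer of access units held in decoding order. The type names, fields, and fixed capacity below are assumptions of this sketch, not anything mandated by the HRD.

```c
#include <stddef.h>

#define CPB_CAPACITY 16 /* illustrative fixed capacity */

typedef struct {
    int poc;             /* picture order count of the coded picture */
    double removal_time; /* nominal CPB removal time, in seconds */
} AccessUnit;

typedef struct {
    AccessUnit units[CPB_CAPACITY];
    size_t head;  /* index of the next AU in decoding order */
    size_t count; /* number of AUs currently buffered */
} CodedPictureBuffer;

/* Append an AU at the tail; returns 0 if the buffer is full. */
int cpb_push(CodedPictureBuffer *cpb, AccessUnit au)
{
    if (cpb->count == CPB_CAPACITY)
        return 0;
    cpb->units[(cpb->head + cpb->count) % CPB_CAPACITY] = au;
    cpb->count++;
    return 1;
}

/* Remove the head AU (the next one in decoding order); returns 0
 * if the buffer is empty. */
int cpb_pop(CodedPictureBuffer *cpb, AccessUnit *out)
{
    if (cpb->count == 0)
        return 0;
    *out = cpb->units[cpb->head];
    cpb->head = (cpb->head + 1) % CPB_CAPACITY;
    cpb->count--;
    return 1;
}
```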

During the decoding process, video decoder 30 receives an encoded video bitstream that represents video blocks of an encoded video slice and associated syntax elements from video encoder 20. Entropy decoding unit 80 of video decoder 30 entropy decodes the bitstream to generate quantized coefficients, motion vectors, and other syntax elements. Entropy decoding unit 80 forwards the motion vectors and other syntax elements to prediction processing unit 81. Video decoder 30 may receive the syntax elements at the video slice level and/or the video block level.

When the video slice is coded as an intra-coded (I) slice, intra-prediction processing unit 84 of prediction processing unit 81 may generate prediction data for a video block of the current video slice based on a signaled intra-prediction mode and data from previously decoded blocks of the current frame or picture. When the video frame is coded as an inter-coded (i.e., B or P) slice, motion compensation unit 82 of prediction processing unit 81 produces predictive blocks for a video block of the current video slice based on the motion vectors and other syntax elements received from entropy decoding unit 80. The predictive blocks may be produced from one of the reference pictures within one of the reference picture lists. Video decoder 30 may construct the reference frame lists, List 0 and List 1, using default construction techniques based on reference pictures stored in DPB 92.

Motion compensation unit 82 determines prediction information for a video block of the current video slice by parsing the motion vectors and other syntax elements, and uses the prediction information to produce the predictive blocks for the current video block being decoded. For example, motion compensation unit 82 uses some of the received syntax elements to determine a prediction mode (e.g., intra- or inter-prediction) used to code the video blocks of the video slice, an inter-prediction slice type (e.g., B slice or P slice), construction information for one or more of the reference picture lists for the slice, motion vectors for each inter-encoded video block of the slice, inter-prediction status for each inter-coded video block of the slice, and other information to decode the video blocks in the current video slice.

Motion compensation unit 82 may also perform interpolation based on interpolation filters. Motion compensation unit 82 may use interpolation filters as used by video encoder 20 during encoding of the video blocks to calculate interpolated values for sub-integer pixels of reference blocks. In this case, motion compensation unit 82 may determine the interpolation filters used by video encoder 20 from the received syntax elements and use the interpolation filters to produce predictive blocks.
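
As a simplified illustration of sub-integer interpolation, the sketch below fills the half-sample positions of one row of reference samples with a 2-tap bilinear filter. HEVC's normative luma filters are longer (7- and 8-tap), so the filter choice and the function name here are assumptions of the example.

```c
/* Generate the (width - 1) half-sample positions between
 * neighboring integer reference samples with a rounded average. */
void interpolate_half_pel_row(const unsigned char *ref, int width,
                              unsigned char *half)
{
    for (int i = 0; i < width - 1; i++)
        half[i] = (unsigned char)((ref[i] + ref[i + 1] + 1) >> 1);
}
```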

Inverse quantization unit 86 inverse quantizes, i.e., de-quantizes, the quantized transform coefficients provided in the bitstream and decoded by entropy decoding unit 80. The inverse quantization process may include use of a quantization parameter calculated by video encoder 20 for each video block in the video slice to determine a degree of quantization and, likewise, a degree of inverse quantization that should be applied. Inverse transform processing unit 88 applies an inverse transform, e.g., an inverse DCT, an inverse integer transform, or a conceptually similar inverse transform process, to the transform coefficients in order to produce residual blocks in the pixel domain.
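
Mirroring the encoder-side quantization sketch earlier, inverse quantization rescales each level back by the same assumed QP-derived step size; again, the function name and the step-size model are assumptions of this illustration, not the normative process of any standard.

```c
#include <math.h>

/* Illustrative inverse quantizer: rescale a quantized level by the
 * same assumed QP-derived step size used at the encoder. */
int dequantize_coeff(int level, int qp)
{
    double step = pow(2.0, (qp - 4) / 6.0); /* assumed step-size model */
    return (int)(level * step + (level < 0 ? -0.5 : 0.5));
}
```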

After motion compensation unit 82 generates the predictive block for the current video block based on the motion vectors and other syntax elements, video decoder 30 forms a decoded video block by summing the residual blocks from inverse transform processing unit 88 with the corresponding predictive blocks generated by motion compensation unit 82. Summer 90 represents the component or components that perform this summation operation. If desired, a deblocking filter may also be applied to filter the decoded blocks in order to remove blockiness artifacts. Other loop filters (either in the coding loop or after the coding loop) may also be used to smooth pixel transitions, or otherwise improve the video quality. The decoded video blocks in a given frame or picture are then stored in DPB 92, which stores reference pictures used for subsequent motion compensation. DPB 92 also stores decoded video for later presentation on a display device, such as display device 32 of FIG. 1. Like CPB 78, in one example, the operation of DPB 92 may be specified by the HRD, as defined by the techniques of this disclosure.

Video decoder 30 may be configured to implement the techniques of this disclosure. In one example, video decoder 30 may be configured to receive a first DPB output delay and a second DPB output delay for a decoded picture, determine, for the decoded picture, a first DPB output time using the first DPB output delay in the case that an HRD setting for a video decoder indicates operation at a picture level, and determine, for the decoded picture, a second DPB output time using the second DPB output delay in the case that the HRD setting for the video decoder indicates operation at a sub-picture level. Further examples of the operation of video decoder 30 in accordance with the techniques of this disclosure will be discussed below with reference to FIG. 6.

FIG. 5 is a flowchart showing an example encoding method according to the techniques of this disclosure. The techniques of FIG. 5 may be implemented by one or more structures of video encoder 20.

In one example, video encoder 20 may be configured to determine a first DPB output time using a first DPB output delay in the case that an HRD setting for a video decoder indicates operation at a picture level (500), and determine a second DPB output time using a second DPB output delay in the case that the HRD setting for the video decoder indicates operation at a sub-picture level (502). Video encoder 20 may be further configured to signal the first DPB output delay and the second DPB output delay in an encoded video bitstream (504).

Video encoder 20 may be further configured to signal a sub-picture CPB flag that indicates whether the HRD setting for the video decoder is at the picture level or at the sub-picture level (506), and encode video pictures based on the sub-picture CPB flag (508).

In one example of the disclosure, determining the second DPB output time comprises multiplying the second DPB output delay by a sub-picture clock tick and adding a resultant value to a CPB removal time. In another example of the disclosure, determining the first DPB output time comprises multiplying the first DPB output delay by a clock tick and adding a resultant value to a CPB removal time.
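
In code, the two derivations just described might be sketched as follows, assuming (consistent with the tick-divisor derivation elsewhere in this disclosure) that the sub-picture clock tick is the picture clock tick divided by a tick divisor; the function and variable names are illustrative only.

```c
#include <stdio.h>

/* Picture level: output time = CPB removal time + clock tick * delay. */
double dpb_output_time_picture(double cpb_removal_time, double clock_tick,
                               unsigned dpb_output_delay)
{
    return cpb_removal_time + clock_tick * dpb_output_delay;
}

/* Sub-picture level: same form, but using the sub-picture clock tick
 * derived from the picture clock tick and a tick divisor (assumed). */
double dpb_output_time_subpic(double cpb_removal_time, double clock_tick,
                              unsigned tick_divisor,
                              unsigned dpb_output_du_delay)
{
    double sub_tick = clock_tick / tick_divisor;
    return cpb_removal_time + sub_tick * dpb_output_du_delay;
}

int main(void)
{
    double t_removal = 0.200; /* CPB removal time, seconds */
    double tick = 1.0 / 30.0; /* picture clock tick at 30 fps */
    printf("picture level:     %.6f s\n",
           dpb_output_time_picture(t_removal, tick, 2));
    printf("sub-picture level: %.6f s\n",
           dpb_output_time_subpic(t_removal, tick, 4, 8));
    return 0;
}
```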

FIG. 6 is a flowchart showing an example decoding method according to the techniques of this disclosure. The techniques of FIG. 6 may be implemented by one or more structures of video decoder 30.

In one example, video decoder 30 may be configured to receive a sub-picture CPB flag that indicates whether the HRD setting for the video decoder is at a picture level or at a sub-picture level (600), and decode video pictures based on the sub-picture CPB flag (602).

Video decoder 30 may be further configured to receive a first DPB output delay and a second DPB output delay for a decoded picture (604), determine, for the decoded picture, a first DPB output time using the first DPB output delay in the case that the HRD setting for the video decoder indicates operation at a picture level (606), and determine, for the decoded picture, a second DPB output time using the second DPB output delay in the case that the HRD setting for the video decoder indicates operation at a sub-picture level (608).

Video decoder 30 may be further configured to output pictures from a decoded picture buffer based on either the first DPB output time or the second DPB output time, depending on the HRD setting (610). The first DPB output time is used if the sub-picture CPB flag indicates that the HRD setting for the video decoder indicates operation at the picture level, and the second DPB output time is used if the sub-picture CPB flag indicates that the HRD setting for the video decoder indicates operation at the sub-picture level.
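
The selection itself reduces to a conditional on the sub-picture CPB flag; a minimal sketch using the helper functions from the previous example, with illustrative names:

```c
/* Choose the output time that governs when the decoded picture
 * leaves the DPB, based on the signaled sub-picture CPB flag. */
double select_dpb_output_time(int sub_pic_cpb_flag,
                              double picture_level_time,
                              double sub_picture_level_time)
{
    return sub_pic_cpb_flag ? sub_picture_level_time : picture_level_time;
}
```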

In one example of the disclosure, video decoder 30 is configured to determine the second DPB output time by multiplying the second DPB output delay by a sub-picture clock tick and adding a resultant value to a CPB removal time. In another example of the disclosure, video decoder 30 is configured to determine the first DPB output time by multiplying the first DPB output delay by a clock tick and adding a resultant value to a CPB removal time.

In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over, as one or more instructions or code, a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to (1) tangible computer-readable storage media which is non-transitory or (2) a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium.

By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transient media, but are instead directed to non-transient, tangible storage media. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.

Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term “processor,” as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.

The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs (e.g., a chip set). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, as described above, various units may be combined in a codec hardware unit or provided by a collection of interoperative hardware units, including one or more processors as described above, in conjunction with suitable software and/or firmware.

Various examples have been described. These and other examples are within the scope of the following claims.

What is claimed is:
1. A method of decoding video, the method comprising: receiving a first decoded picture buffer (DPB) output delay and a second DPB output delay; determining whether a hypothetical reference decoder (HRD) operates at an access unit level or operates at a sub-picture level; and based on a determination that the HRD operates at the access unit level, determining, by a video decoding device, for a decoded picture, a first DPB output time based on the first DPB output delay and a picture clock tick, or based on a determination that the HRD operates at the sub-picture level: deriving a sub-picture clock tick based on the picture clock tick and a tick divisor value; and determining, by the video decoding device, for the decoded picture, a second DPB output time based on the second DPB output delay and the sub-picture clock tick.
2. The method of claim 1, further comprising: processing a sub-picture coded picture buffer (CPB) flag that indicates whether the HRD operates at the access unit level or operates at the sub-picture level; and outputting the decoded picture from a DPB based on the first DPB output time or the second DPB output time, wherein the decoded picture is output based on the first DPB output time based on the sub-picture CPB flag indicating that the HRD operates at the access unit level, or the decoded picture is output based on the second DPB output time based on the sub-picture CPB flag indicating that the HRD operates at the sub-picture level.
3. The method of claim 1, wherein determining the second DPB output time comprises multiplying the second DPB output delay by a value corresponding to the sub-picture clock tick to obtain a product and adding the product to a CPB removal time.
4. The method of claim 1, wherein determining the first DPB output time comprises multiplying the first DPB output delay by a value corresponding to the picture clock tick to obtain a product and adding the product to a CPB removal time.
5. A method of encoding video, the method comprising: signaling, in a picture timing supplemental enhancement information (SEI) message associated with an access unit, a first decoded picture buffer (DPB) output delay, the first DPB output delay being indicative of a number of picture clock ticks to wait after removal of a last decoding unit in the access unit from a coded picture buffer (CPB) before a decoded picture is output from a DPB; signaling a presence of sub-picture level parameters in the bitstream; signaling a tick divisor value; and in response to signaling the presence of sub-picture level parameters, signaling, in the picture timing SEI message associated with the access unit, a second DPB output delay, the second DPB output delay being indicative of a number of sub-picture clock ticks to wait after removal of the last decoding unit in the access unit from the CPB before the decoded picture is output from the DPB, sub-picture clock ticks being derived based on picture clock ticks.
6. The method of claim 5, further comprising: determining a first DPB output time, wherein determining the first DPB output time comprises multiplying the first DPB output delay by a value corresponding to the picture clock tick to obtain a product and adding the product to a CPB removal time; and determining a second DPB output time, wherein determining the second DPB output time comprises multiplying the second DPB output delay by a value corresponding to the sub-picture clock tick to obtain a product and adding the product to a CPB removal time.
7. An apparatus configured to decode video data, the apparatus comprising: a memory configured to store the video data; and a video decoder in communication with the memory, the video decoder configured to: receive a first decoded picture buffer (DPB) output delay and a second DPB output delay; determine whether a hypothetical reference decoder (HRD) operates at an access unit level or operates at a sub-picture level; and based on a determination that the HRD operates at the access unit level, determine, for a decoded picture, a first DPB output time based on the first DPB output delay and a picture clock tick, and based on a determination that the HRD operates at the sub-picture level: derive a sub-picture clock tick based on the picture clock tick and a tick divisor value; and determine, for the decoded picture, a second DPB output time based on the second DPB output delay and the sub-picture clock tick.
8. The apparatus of claim 7, wherein the video decoder is further configured to: process a sub-picture coded picture buffer (CPB) flag that indicates whether the HRD operates at the access unit level or operates at the sub-picture level; and output the decoded picture from a DPB based on the first DPB output time or the second DPB output time, wherein the decoded picture is output based on the first DPB output time based on the sub-picture CPB flag indicating that the HRD operates at the access unit level, or the decoded picture is output based on the second DPB output time based on the sub-picture CPB flag indicating that the HRD operates at the sub-picture level.
9. The apparatus of claim 7, wherein the video decoder is configured to determine the second DPB output time by multiplying the second DPB output delay by a value corresponding to the sub-picture clock tick to obtain a product and adding the product to a CPB removal time.
10. The apparatus of claim 7, wherein the video decoder is configured to determine the first DPB output time by multiplying the first DPB output delay by a value corresponding to the picture clock tick to obtain a product and adding the product to a CPB removal time.
11. An apparatus configured to encode video data, the apparatus comprising: a memory configured to store the video data; and a video encoder in communication with the memory, the video encoder configured to: signal, in a picture timing supplemental enhancement information (SEI) message associated with an access unit, a first decoded picture buffer (DPB) output delay, the first DPB output delay being indicative of a number of picture clock ticks to wait after removal of a last decoding unit in the access unit from a coded picture buffer (CPB) before a decoded picture is output from a DPB; signal a presence of sub-picture level parameters in the bitstream; signal a tick divisor value; and in response to signaling the presence of sub-picture level parameters, signal, in the picture timing SEI message associated with the access unit, a second DPB output delay, the second DPB output delay being indicative of a number of sub-picture clock ticks to wait after removal of the last decoding unit in the access unit from the CPB before the decoded picture is output from the DPB, sub-picture clock ticks being derived based on picture clock ticks.
12. The apparatus of claim 11, wherein the video encoder is further configured to: determine the first DPB output time by multiplying the first DPB output delay by a value corresponding to the picture clock tick to obtain a first DPB output product and adding the first DPB output product to a CPB removal time; and determine the second DPB output time by multiplying the second DPB output delay by a value corresponding to the sub-picture clock tick to obtain a second DPB output product and adding the second DPB output product to a CPB removal time.
13. An apparatus configured to decode video data, the apparatus comprising: means for receiving a first decoded picture buffer (DPB) output delay and a second DPB output delay; indication means for determining whether a hypothetical reference decoder (HRD) operates at an access unit level or operates at a sub-picture level; means, operable when the indication means determines operation of the HRD at the access unit level, for determining, for a decoded picture, a first DPB output time based on the first DPB output delay and a picture clock tick; means for deriving a sub-picture clock tick based on the picture clock tick and a tick divisor value; and means, operable when the indication means determines operation of the HRD at the sub-picture level, for determining, for the decoded picture, a second DPB output time based on the second DPB output delay and the sub-picture clock tick.
14. The apparatus of claim 13, further comprising: means for processing a sub-picture coded picture buffer (CPB) flag that indicates whether the HRD operates at the access unit level or operates at the sub-picture level; and means for outputting the decoded picture from a DPB based on the first DPB output time or the second DPB output time, wherein the decoded picture is output based on the first DPB output time based on the sub-picture CPB flag indicating that the HRD operates at the access unit level, or the decoded picture is output based on the second DPB output time based on the sub-picture CPB flag indicating that the HRD operates at the sub-picture level.
15. The apparatus of claim 13, wherein the means for determining the second DPB output time comprises means for multiplying the second DPB output delay by a value corresponding to the sub-picture clock tick to obtain a product and adding the product to a CPB removal time.
16. The apparatus of claim 13, wherein the means for determining the first DPB output time comprises means for multiplying the first DPB output delay by a value corresponding to the picture clock tick to obtain a product and adding the product to a CPB removal time.
17. An apparatus configured to encode video data, the apparatus comprising: means for signaling, in a picture timing supplemental enhancement information (SEI) message associated with an access unit, a first decoded picture buffer (DPB) output delay, the first DPB output delay being indicative of a number of picture clock ticks to wait after removal of a last decoding unit in the access unit from a coded picture buffer (CPB) before a decoded picture is output from a DPB; means for signaling a presence of sub-picture level parameters in the bitstream; means for signaling a tick divisor value; and means, operable in response to signaling the presence of sub-picture level parameters, for signaling, in the picture timing SEI message associated with the access unit, a second DPB output delay, the second DPB output delay being indicative of a number of sub-picture clock ticks to wait after removal of the last decoding unit in the access unit from the CPB before the decoded picture is output from the DPB, sub-picture clock ticks being derived based on picture clock ticks.
18. The apparatus of claim 17, wherein the means for determining the first DPB output time comprises means for multiplying the first DPB output delay by a value corresponding to the picture clock tick to obtain a first DPB output product and adding the first DPB output product to a CPB removal time; and wherein the means for determining the second DPB output time comprises means for multiplying the second DPB output delay by a value corresponding to the sub-picture clock tick to obtain a second DPB output product and adding the second DPB output product to a CPB removal time.
19. A non-transitory computer-readable storage medium storing instructions that, when executed, cause one or more processors of a device configured to decode video data to: receive a first decoded picture buffer (DPB) output delay and a second DPB output delay; determine whether a hypothetical reference decoder (HRD) operates at an access unit level or operates at a sub-picture level; and based on a determination that the HRD operates at the access unit level, determine, for a decoded picture, a first DPB output time based on the first DPB output delay and a picture clock tick; and based on a determination that the HRD operates at the sub-picture level: derive a sub-picture clock tick based on the picture clock tick and a tick divisor value; and determine, for the decoded picture, a second DPB output time based on the second DPB output delay and the sub-picture clock tick.
20. The non-transitory computer-readable storage medium of claim 19, wherein the instructions further cause the one or more processors to: process a sub-picture coded picture buffer (CPB) flag that indicates whether the HRD operates at the access unit level or operates at the sub-picture level; and output the decoded picture from a DPB based on the first DPB output time or the second DPB output time, wherein the decoded picture is output based on the first DPB output time based on the sub-picture CPB flag indicating that the HRD operates at the access unit level, or the decoded picture is output based on the second DPB output time based on the sub-picture CPB flag indicating that the HRD operates at the sub-picture level.
21. The non-transitory computer-readable storage medium of claim 19, wherein the instructions further cause the one or more processors to multiply the second DPB output delay by a value corresponding to the sub-picture clock tick to obtain a product and add the product to a CPB removal time.
22. The non-transitory computer-readable storage medium of claim 19, wherein the instructions further cause the one or more processors to multiply the first DPB output delay by a value corresponding to the picture clock tick to obtain a product and add the product to a CPB removal time.
23. A non-transitory computer-readable storage medium storing instructions that, when executed, cause one or more processors of a device configured to encode video data to: signal, in a picture timing supplemental enhancement information (SEI) message associated with an access unit, a first decoded picture buffer (DPB) output delay, the first DPB output delay being indicative of a number of picture clock ticks to wait after removal of a last decoding unit in the access unit from a coded picture buffer (CPB) before a decoded picture is output from a DPB; signal a presence of sub-picture level parameters in the bitstream; signal a tick divisor value; and in response to signaling the presence of sub-picture level parameters, signal, in the picture timing SEI message associated with the access unit, a second DPB output delay, the second DPB output delay being indicative of a number of sub-picture clock ticks to wait after removal of the last decoding unit in the access unit from the CPB before the decoded picture is output from the DPB, sub-picture clock ticks being derived based on picture clock ticks.
24. The non-transitory computer-readable storage medium of claim 23, wherein the instructions further include instructions that, when executed, cause the one or more processors to determine the first DPB output time by multiplying the first DPB output delay by a value corresponding to the picture clock tick to obtain a first DPB output product and adding the first DPB output product to a CPB removal time; and wherein the instructions further include instructions that, when executed, cause the one or more processors to determine the second DPB output time by multiplying the second DPB output delay by a value corresponding to the sub-picture clock tick to obtain a second DPB output product and adding the second DPB output product to a CPB removal time.